Coming from Peru to Scotland was not only a cultural shock but an
environmental one, if you will. I had never experienced such strikingly
beautiful but equally emotionally debilitating winters. In Scotland,
daylight is precious despite its scarcity in Scottish winters. Here is
when I first heard the term ‘Seasonal depression’ being used the most at
this time of the year, between the end of Autumn and the cusp of Winter;
whereas it’s a passing comment from a friend, a sitcom joke, or a
genuine Google search during those long, dark, and cold winter days when
you feel extra homesick.
Throughout my time in Scotland I have wondered if seasonal depression is
more a myth than a reality but have never quite gathered more than
qualitative data in the form of conversations with friends and
acquaintances. Therefore, I have decided to quantify these verbal leads
through this report.
The research question to be explored is:
To what extent do seasonal variations relating to daylight hours and temperature affect antidepressant prescribing rates across Scottish NHS Health Boards?
The data time frame to be used is from March 2024 to March 2025 to assess the potential cyclical nature of results and make adequate comparisons between seasons.
Spring: March, April, May
Summer: June, July, August
Autumn: September, October, November
Winter: December, January, February
This categorisation has been made according to the Met Office’s territorial delineation of Scotland based on the distribution of climate measurements as available in their website.
Northern Scotland: NHS Highland, NHS Western Isles,
NHS Orkney, NHS Shetland
Eastern Scotland: NHS Borders, NHS Lothian, NHS Fife,
NHS Tayside, NHS Grampian, NHS Forth Valley
Western Scotland: NHS Ayrshire and Arran, NHS Dumfries
and Galloway, NHS Greater Glasgow and Clyde, NHS Lanarkshire
Raw data from multiple institutional websites include all medications prescribed by NHS Health Boards. To differentiate and include only Antidepressant prescriptions, only prescriptions with a BNF item code starting with ‘0403’ will be used.
This report aims to be a multi-factorial exploration of the environmental and social dimensions of antidepressant prescription rates in Scotland given that it’s analysis will focus on three distinct aspects as shown in the hypotheses.
This report puts forward an overarching hypothesis:
Null (H0): Antidepressant prescription rates do not have seasonal patterns.
Alternative (Ha): Antidepressant prescription rates have seasonal patterns.
Where within Ha, the following is proposed:
1a. Prescription rates increase during shorter months with shorter daylight hours and colder temperatures (autumn and winter) and decrease during those with longer daylight hours and warmer temperature (spring and summer).
In the assumption that Ha is true, this report proposed two further sub-hypotheses that further enrich the multi-factorial lens through which seasonal effects inlfuence prescription rates:
I. The latitude of prescribing NHS Health Boards further influences the seasonality of antidepressant prescription rates, the more northern a health board is, the heavier the impact of these seasonal patterns.
II. Areas with a higher socioeconomic deprivation index (SMID) will have a consistently higher benchmark antidepressant use regardless of season, making the impact of varying daylight hours and temperature be even greater.
January to June 2024: https://www.opendata.nhs.scot/dataset/84393984-14e9-4b0d-a797-b288db64d088/resource/f0df380b-3f9b-4536-bb87-569e189b727a/download/hb_pitc2024_01_06-1.csv
July to December 2024: https://www.opendata.nhs.scot/dataset/84393984-14e9-4b0d-a797-b288db64d088/resource/f3b9f2e2-66c0-4310-9b8e-734781d2ed0a/download/hb_pitc2024_07_12-1.csv
January to June 2025: https://www.opendata.nhs.scot/dataset/84393984-14e9-4b0d-a797-b288db64d088/resource/9de908b3-9c28-4cc3-aa32-72350a0579d1/download/hb_pitc2025_01_06.csv
North Scotland:
Daylight hrs - https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Sunshine/ranked/Scotland_N.txt
Temperature - https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Tmean/date/Scotland_N.txt
East Scotland:
Daylight hrs - https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Sunshine/ranked/Scotland_E.txt
Temperature - https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Tmean/date/Scotland_E.txt
West Scotland:
Daylight hrs - https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Sunshine/ranked/Scotland_W.txt
Temperature - https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Tmean/date/Scotland_W.txt
Taken from the Public Health Scotland website:
https://www.opendata.nhs.scot/dataset/78d41fa9-1a62-4f7b-9edb-3e8522a93378/resource/acade396-8430-4b34-895a-b3e757fa346e/download/simd2020v2_22062020.csv
Also taken from the Public Health Scotland website:
https://www.opendata.nhs.scot/dataset/e3300e98-cdd2-4f4e-a24e-06ee14fcc66c/resource/cec9341e-ccba-4c71-afe4-a614f5e97b9f/download/practice_listsizes_oct2024-open-data.csv
1.1 Load all the required libraries on RStudio
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here) # for the upkeep of the directory structure
## here() starts at /Users/florenciasolorzano/Documents/data_science/B218332
library(janitor) # for data cleaning
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(gt) # for table building
1.2 Load all the Health Board Data (January 2024:June 2025) into
RStudio and use the janitor package by using the
clean_names() function to have uniform names throughout
datasets
prescr_jan_june_2024 <- read_csv("https://www.opendata.nhs.scot/dataset/84393984-14e9-4b0d-a797-b288db64d088/resource/f0df380b-3f9b-4536-bb87-569e189b727a/download/hb_pitc2024_01_06-1.csv") %>%
clean_names()
## Rows: 763324 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): HBT, BNFItemCode, BNFItemDescription, PrescribedType
## dbl (5): DMDCode, NumberOfPaidItems, PaidQuantity, GrossIngredientCost, Paid...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
prescr_july_dec_2024 <- read_csv("https://www.opendata.nhs.scot/dataset/84393984-14e9-4b0d-a797-b288db64d088/resource/f3b9f2e2-66c0-4310-9b8e-734781d2ed0a/download/hb_pitc2024_07_12-1.csv") %>%
clean_names()
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 752709 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): HBT, BNFItemCode, BNFItemDescription, PrescribedType
## dbl (5): DMDCode, NumberOfPaidItems, PaidQuantity, GrossIngredientCost, Paid...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
prescr_jan_june_2025 <- read_csv("https://www.opendata.nhs.scot/dataset/84393984-14e9-4b0d-a797-b288db64d088/resource/9de908b3-9c28-4cc3-aa32-72350a0579d1/download/hb_pitc2025_01_06.csv") %>%
clean_names()
## Rows: 733347 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): HBT, BNFItemCode, BNFItemDescription, PrescribedType
## dbl (5): DMDCode, NumberOfPaidItems, PaidQuantity, GrossIngredientCost, Paid...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
1.3 Now filter out by “bnf_item_code” to keep only those prescriptions that are antidepressants (codes starting with ‘0403’) and that were prescribed within our selected time frame (March 2024 - March 2025). Make sure to maintain the following: hbt, bnf_item_description, number_of_paid_items, and paid_date_month. Repeat for all Health Board datasets.
antidepr_march_june_2024 <- prescr_jan_june_2024 %>%
filter(str_detect(bnf_item_code, "^0403")) %>%
filter(paid_date_month != "202401" & paid_date_month != "202402") %>%
select(c(hbt,bnf_item_code,bnf_item_description,number_of_paid_items,paid_date_month)) %>%
group_by(hbt, paid_date_month) %>%
summarise(number_of_items = sum(number_of_paid_items, na.rm = TRUE)) %>%
arrange(paid_date_month)
## `summarise()` has grouped output by 'hbt'. You can override using the `.groups`
## argument.
antidepr_july_dec_2024 <- prescr_july_dec_2024 %>%
filter(str_detect(bnf_item_code, "^0403")) %>%
select(c(hbt,bnf_item_code,bnf_item_description,number_of_paid_items,paid_date_month)) %>%
group_by(hbt, paid_date_month) %>%
summarise(number_of_items = sum(number_of_paid_items, na.rm = TRUE)) %>%
arrange(paid_date_month)
## `summarise()` has grouped output by 'hbt'. You can override using the `.groups`
## argument.
antidepr_jan_feb_2025 <- prescr_jan_june_2025 %>%
filter(str_detect(bnf_item_code, "^0403")) %>%
filter(paid_date_month != "202503" & paid_date_month != "202504" & paid_date_month != "202505" & paid_date_month != "202506") %>%
select(c(hbt,bnf_item_code,bnf_item_description,number_of_paid_items,paid_date_month)) %>%
group_by(hbt, paid_date_month) %>%
summarise(number_of_items = sum(number_of_paid_items, na.rm = TRUE)) %>%
arrange(paid_date_month)
## `summarise()` has grouped output by 'hbt'. You can override using the `.groups`
## argument.
1.4 Join the three antidepressant datasets to create a unified March 2024-March 2025 prescription data object. Then, create an object for seasonal categorisations and merge it with the unified object created.
spr_months_202425 <- c(202403, 202404, 202405)
sum_months_202425 <- c(202406, 202407, 202408)
aut_months_202425 <- c(202409, 202410, 202411)
win_months_202425 <- c(202412, 202501, 202502)
seasons_202425 <- tibble(
month = c(spr_months_202425, sum_months_202425, aut_months_202425, win_months_202425),
season = c(rep("spr", length(spr_months_202425)),
rep("sum", length(sum_months_202425)),
rep("aut", length(aut_months_202425)),
rep("win", length(win_months_202425))))
yearly_antidepr_data <- antidepr_march_june_2024 %>%
full_join(antidepr_july_dec_2024)
## Joining with `by = join_by(hbt, paid_date_month, number_of_items)`
yearly_antidepr_data <- yearly_antidepr_data %>%
full_join(antidepr_jan_feb_2025) %>%
full_join(seasons_202425, join_by(paid_date_month == month))
## Joining with `by = join_by(hbt, paid_date_month, number_of_items)`
1.5 Create an object for geographical NHS Health Board names. Make
each health board be geographically named to match Met Office data later
on. Then, full_join() the dataset with the
yearly_antidepr_data object created in the previous step.
north_hb <- c("NHS Highland", "NHS Western Isles", "NHS Orkney", "NHS Shetland")
east_hb <- c("NHS Borders", "NHS Lothian", "NHS Fife", "NHS Tayside", "NHS Grampian", "NHS Forth Valley")
west_hb <- c("NHS Ayrshire and Arran", "NHS Dumfries and Galloway", "NHS Greater Glasgow and Clyde", "NHS Lanarkshire")
metoffice_hb_region <- tibble(
hb_name = c(north_hb, east_hb, west_hb),
region = c(rep("north", length(north_hb)),
rep("east", length(east_hb)),
rep("west", length(west_hb))))
hb_names <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv") %>%
clean_names() %>%
full_join(metoffice_hb_region)
## Rows: 18 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): HB, HBName, Country
## dbl (2): HBDateEnacted, HBDateArchived
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Joining with `by = join_by(hb_name)`
hbt_antidepr_yearly <- yearly_antidepr_data %>%
full_join(hb_names, join_by(hbt == hb)) %>%
select(-c(hb_date_archived, hb_date_archived, hb_date_enacted, country))
1.6 Notes at this stage:
full_join() there are 4 rows with NA in
prescription data (paid_date_month and number_or_items). Upon further
investigation and a look at the object hb_names, these were shown to
have had their hbt numbers archived in 2018 and 2019, making them easily
removable from our data.clean_hbt_antidepr_yearly <- hbt_antidepr_yearly %>%
filter(!is.na(hb_name)) %>%
filter(!is.na(paid_date_month))
1.7 Load, wrangle and organise NHS Health Board population data from the data file from October 2024. The date chosen is deliberate as October 2024 marks approximately half-way through the determined time period that this investigation is focusing on.
hb_pop_oct_2024 <- read_csv("https://www.opendata.nhs.scot/dataset/e3300e98-cdd2-4f4e-a24e-06ee14fcc66c/resource/cec9341e-ccba-4c71-afe4-a614f5e97b9f/download/practice_listsizes_oct2024-open-data.csv") %>%
clean_names()
## Rows: 2664 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): HB, HSCP, Sex, SexQF, AllAgesQF, Ages0To4QF, Ages5To14QF, Ages15To...
## dbl (11): Date, PracticeCode, AllAges, Ages0to4, Ages5to14, Ages15to24, Ages...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
clean_hb_population <- hb_pop_oct_2024 %>%
select(hb, sex, all_ages) %>%
filter(!sex %in% c("Male", "Female")) %>%
group_by(hb) %>%
summarise(hb_population = sum(all_ages))
1.8 Introduce population data for each NHS health board.
antidepr_yearly_hb_pop <- clean_hbt_antidepr_yearly %>%
full_join(clean_hb_population, join_by(hbt == hb))
1.9 Now let’s introduce the daylight hours data from the UK Met Office. Given the nature of the .txt files, I converted them into .csv files using Excel. These files can be found in the ‘data’ folder attached.
daylight_north_scot <- read_table(here("docs","data", "R_north_scotland_sunshine.csv"))
## Warning: Duplicated column names deduplicated: 'year' => 'year_1' [4], 'year'
## => 'year_2' [6], 'year' => 'year_3' [8], 'year' => 'year_4' [10], 'year' =>
## 'year_5' [12], 'year' => 'year_6' [14], 'year' => 'year_7' [16], 'year' =>
## 'year_8' [18], 'year' => 'year_9' [20], 'year' => 'year_10' [22], 'year' =>
## 'year_11' [24], 'year' => 'year_12' [26], 'year' => 'year_13' [28], 'year' =>
## 'year_14' [30], 'year' => 'year_15' [32], 'year' => 'year_16' [34]
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## nov = col_character(),
## dec = col_character(),
## win = col_character(),
## aut = col_character(),
## ann = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
daylight_east_scot <- read_table(here("docs","data", "R_east_scotland_sunshine.csv"))
## Warning: Duplicated column names deduplicated: 'year' => 'year_1' [4], 'year'
## => 'year_2' [6], 'year' => 'year_3' [8], 'year' => 'year_4' [10], 'year' =>
## 'year_5' [12], 'year' => 'year_6' [14], 'year' => 'year_7' [16], 'year' =>
## 'year_8' [18], 'year' => 'year_9' [20], 'year' => 'year_10' [22], 'year' =>
## 'year_11' [24], 'year' => 'year_12' [26], 'year' => 'year_13' [28], 'year' =>
## 'year_14' [30], 'year' => 'year_15' [32], 'year' => 'year_16' [34]
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## nov = col_character(),
## dec = col_character(),
## win = col_character(),
## aut = col_character(),
## ann = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
daylight_west_scot <- read_table(here("docs","data","R_west_scotland_sunshine.csv"))
## Warning: Duplicated column names deduplicated: 'year' => 'year_1' [4], 'year'
## => 'year_2' [6], 'year' => 'year_3' [8], 'year' => 'year_4' [10], 'year' =>
## 'year_5' [12], 'year' => 'year_6' [14], 'year' => 'year_7' [16], 'year' =>
## 'year_8' [18], 'year' => 'year_9' [20], 'year' => 'year_10' [22], 'year' =>
## 'year_11' [24], 'year' => 'year_12' [26], 'year' => 'year_13' [28], 'year' =>
## 'year_14' [30], 'year' => 'year_15' [32], 'year' => 'year_16' [34]
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## nov = col_character(),
## dec = col_character(),
## win = col_character(),
## aut = col_character(),
## ann = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
1.10 Upon inspection of the data, one can notice that there are specific columns for each season apart from one for each month. This report will be using the seasonal data to make the merging processes easier. Now, each daylight hours object will be cleaned and wrangled to only have March 2024-March 2025 data (March 1st 2024 to February 28th, 2025). The final object will be named all_seasons_daylight, which will contain all daylight hours per season per region for the March 2024-March 2025 analysis time period.
# North of Scotland
winter_daylight_north_2024_25 <- daylight_north_scot %>%
select(win,year_12) %>%
filter(year_12 == "2024") %>% # December 2024, January 2025, and February 2025 are wanted
rename(year = year_12)
spr_daylight_north_2024_25 <- daylight_north_scot %>%
select(spr,year_13) %>%
filter(year_13 == "2024") %>% # March 2024, April 2024, and May 2024 are wanted
rename(year = year_13)
sum_daylight_north_2024_25 <- daylight_north_scot %>%
select(sum,year_14) %>%
filter(year_14 == "2024") %>% # June 2024, July 2024, and August 2024 are wanted
rename(year = year_14)
aut_daylight_north_2024_25 <- daylight_north_scot %>%
select(aut,year_15) %>%
filter(year_15 == "2024") %>% # September 2024, October 2024, and November 2024 are wanted
rename(year = year_15)
season_daylight_north_complete <- winter_daylight_north_2024_25 %>%
full_join(spr_daylight_north_2024_25, by = "year") %>%
full_join(sum_daylight_north_2024_25, by = "year") %>%
full_join(aut_daylight_north_2024_25, by = "year") %>%
relocate(win, .after = aut) %>%
mutate(across(spr:win, as.numeric)) %>%
pivot_longer(cols = spr:win, names_to = "season", values_to = "daylight_hrs") %>%
arrange(year, factor(season, levels = c("spr", "sum", "aut", "win"))) %>%
mutate(region = "north") %>%
filter(!is.na(daylight_hrs))
# East of Scotland
winter_daylight_east_2024_25 <- daylight_east_scot %>%
select(win,year_12) %>%
filter(year_12 == "2024") %>% # December 2024, January 2025, and February 2025 are wanted
rename(year = year_12)
spr_daylight_east_2024_25 <- daylight_east_scot %>%
select(spr,year_13) %>%
filter(year_13 == "2024") %>% # March 2024, April 2024, and May 2024 are wanted
rename(year = year_13)
sum_daylight_east_2024_25 <- daylight_east_scot %>%
select(sum,year_14) %>%
filter(year_14 == "2024") %>% # June 2024, July 2024, and August 2024 are wanted
rename(year = year_14)
aut_daylight_east_2024_25 <- daylight_east_scot %>%
select(aut,year_15) %>%
filter(year_15 == "2024") %>% # September 2024, October 2024, and November 2024 are wanted
rename(year = year_15)
season_daylight_east_complete <- winter_daylight_east_2024_25 %>%
full_join(spr_daylight_east_2024_25, by = "year") %>%
full_join(sum_daylight_east_2024_25, by = "year") %>%
full_join(aut_daylight_east_2024_25, by = "year") %>%
relocate(win, .after = aut) %>%
mutate(across(spr:win, as.numeric)) %>%
pivot_longer(cols = spr:win, names_to = "season", values_to = "daylight_hrs") %>%
arrange(year, factor(season, levels = c("spr", "sum", "aut", "win"))) %>%
mutate(region = "east") %>%
filter(!is.na(daylight_hrs))
# West of Scotland
winter_daylight_west_2024_25 <- daylight_west_scot %>%
select(win,year_12) %>%
filter(year_12 == "2024") %>% # December 2024, January 2025, and February 2025 are wanted
rename(year = year_12)
spr_daylight_west_2024_25 <- daylight_west_scot %>%
select(spr,year_13) %>%
filter(year_13 == "2024") %>% # March 2024, April 2024, and May 2024 are wanted
rename(year = year_13)
sum_daylight_west_2024_25 <- daylight_west_scot %>%
select(sum,year_14) %>%
filter(year_14 == "2024") %>% # June 2024, July 2024, and August 2024 are wanted
rename(year = year_14)
aut_daylight_west_2024_25 <- daylight_west_scot %>%
select(aut,year_15) %>%
filter(year_15 == "2024") %>% # September 2024, October 2024, and November 2024 are wanted
rename(year = year_15)
season_daylight_west_complete <- winter_daylight_west_2024_25 %>%
full_join(spr_daylight_west_2024_25, by = "year") %>%
full_join(sum_daylight_west_2024_25, by = "year") %>%
full_join(aut_daylight_west_2024_25, by = "year") %>%
relocate(win, .after = aut) %>%
mutate(across(spr:win, as.numeric)) %>%
pivot_longer(cols = spr:win, names_to = "season", values_to = "daylight_hrs") %>%
arrange(year, factor(season, levels = c("spr", "sum", "aut", "win"))) %>%
mutate(region = "west") %>%
filter(!is.na(daylight_hrs))
# Final 'season' object to be used for data merging with NHS Health Board prescription and population data
all_seasons_daylight <- season_daylight_north_complete %>%
full_join(season_daylight_east_complete) %>%
full_join(season_daylight_west_complete)
## Joining with `by = join_by(year, season, daylight_hrs, region)`
## Joining with `by = join_by(year, season, daylight_hrs, region)`
1.11 Now daylight data will be inputted into the antidepr_yearly_hb_pop object as our final data merge, where categorisation of prescription data will follow seasonal categorisation (See Season Categorisation and NHS Health Board Geographical Categorisation).
antidepr_season_hb_pop <- antidepr_yearly_hb_pop %>%
group_by(paid_date_month, region) %>%
mutate(year = as.integer(substr(paid_date_month,1,4)),
month = as.integer(substr(paid_date_month, 5, 6)),
season = case_when(
month %in% 3:5 ~ "spr",
month %in% 6:8 ~ "sum",
month %in% 9:11 ~ "aut",
month %in% c(12,1,2) ~ "win"),
season_year = if_else(month %in% c(1, 2), year - 1, year)) %>%
full_join(all_seasons_daylight, by = c("season_year" = "year", "season", "region"))
2.1 For prescription data standardisation, a calculation of items_per_1000_people will be made. This is because all NHS Health Boards have different populations, thus, comparing their “number_of_items” solely would not provide correct conclusions.
antidepr_season_hb_pop <- antidepr_season_hb_pop %>%
mutate(items_per_1000 = (number_of_items/hb_population)*1000)
2.2. Now, let’s introduce the Scottish Index of Multiple Deprivation (SMID quintiles) for each Health Board and create an analysis object called “analysis_antidepr_smid” for deprivation analysis.
smid_hb <- read_csv("https://www.opendata.nhs.scot/dataset/78d41fa9-1a62-4f7b-9edb-3e8522a93378/resource/acade396-8430-4b34-895a-b3e757fa346e/download/simd2020v2_22062020.csv") %>%
clean_names() %>%
select(c(hb, simd2020v2rank)) %>%
group_by(hb) %>%
summarise(SMID_mean_rank = mean(simd2020v2rank))
## Rows: 6976 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): DataZone, IntZone, HB, HSCP, CA
## dbl (11): SIMD2020V2Rank, SIMD2020V2CountryDecile, SIMD2020V2CountryQuintile...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
analysis_antidepr_smid <- antidepr_season_hb_pop %>%
full_join(smid_hb, join_by(hbt == hb)) %>%
group_by(hbt, hb_name, season, season_year) %>%
summarise(av_daylight_hrs = mean(daylight_hrs), av_items_per_1000 = mean(items_per_1000), SMID_rank = mean(SMID_mean_rank))
## `summarise()` has grouped output by 'hbt', 'hb_name', 'season'. You can
## override using the `.groups` argument.
2.3. Clean data table to provide foundation for data visualisation and analysis for regional analysis.
analysis_antidepr_season <- antidepr_season_hb_pop %>%
group_by(region, season, paid_date_month) %>%
summarise(av_daylight_hrs = mean(daylight_hrs), av_items_per_1000 = mean(items_per_1000)) %>%
mutate(month = as.Date(paste0(paid_date_month, "01"), "%Y%m%d")) %>%
select(c(region, season, av_daylight_hrs, av_items_per_1000, month))
## `summarise()` has grouped output by 'region', 'season'. You can override using
## the `.groups` argument.
2.4. Load and merge the Health Board shapefile with previous clean data sets. Wrangle and save for geospatial analysis.
library(sf)
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
NHS_hb_geo <- st_read(here("docs","data", "Week6_NHS_healthboards_2019.shp")) # for geospatial data viz
## Reading layer `Week6_NHS_healthboards_2019' from data source
## `/Users/florenciasolorzano/Documents/data_science/B218332/docs/data/Week6_NHS_healthboards_2019.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 14 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 7564.996 ymin: 530635.8 xmax: 468754.8 ymax: 1218625
## Projected CRS: OSGB36 / British National Grid
analysis_geo_antidepr <- NHS_hb_geo %>%
full_join(analysis_antidepr_smid, by = join_by(HBCode == hbt)) %>%
select(-c(season_year, HBName))
3.1. Overall trends in antidepressant prescriptions and total daylight hours
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
total_daylight_analysis <- analysis_antidepr_season %>%
group_by(season, month) %>%
summarise(mean_presc = mean(av_items_per_1000), mean_daylight = mean(av_daylight_hrs)) %>%
ungroup()
## `summarise()` has grouped output by 'season'. You can override using the
## `.groups` argument.
general_analysis_line <- total_daylight_analysis %>%
arrange(month) %>%
plot_ly() %>%
add_segments(
x = ~month[season != lag(season, default = first(season))],
xend = ~month[season != lag(season, default = first(season))],
y = 0,
yend = max(total_daylight_analysis$mean_daylight),
line = list(color = "gray", dash = "dash"),
showlegend = FALSE) %>%
add_lines(
x = ~month, y = ~mean_daylight,
name = "Daylight Hours", line = list(color = "moccasin")) %>%
add_lines(
x = ~month, y = ~mean_presc,
name = "Items per 1000", yaxis = "y2", line = list(color = "blue")) %>%
layout(
title = "Daylight Hours and Antidepressant Prescriptions in Scotland",
yaxis = list(title = "Daylight Hours"),
yaxis2 = list(title = "Items per 1,000", overlaying = "y", side = "right"),
xaxis = list(
title = "Month",
type = "date",
tickmode = "linear",
dtick = "M1",
tickangle = -45))
general_analysis_line
3.2 Composite graph showing standardised antidepressant prescription rates proportional to NHS Health Board population and average daylight hours per month from March 2024 to March 2025, faceted by Scottish Geographical Region (North, East, and West)
library(ggplot2)
library(ggtext)
regional_analysis_composite <- analysis_antidepr_season
ggplot(analysis_antidepr_season, aes(x = month)) +
geom_col(aes(y = av_daylight_hrs), fill = "skyblue", alpha = 0.6) +
geom_line(
aes(y = av_items_per_1000, color = "Antidepressant prescriptions per 1000 people"),
linewidth = 1.2, group = 1) +
facet_wrap(~ region, ncol = 3, scales = "free_x") +
scale_y_continuous(
name = "Average Daylight Hours (hrs)",
sec.axis = sec_axis(~ ., name = "Average Antidepressant prescriptions (units/1000 people)")) +
scale_x_date(
date_labels = "%b\n%Y",
breaks = "1 month",
expand = c(0.01, 0.01)) +
scale_color_manual(values = c("Antidepressant prescriptions per 1000 people" = "darkred")) +
labs(
title = "Antidepressant prescriptions and Average Daylight hours from March 2024 - March 2025 by Scottish Geographical Region",
x = "Month",
caption = "Bars = Daylight Hours | Line = Antidepressant Prescriptions") +
theme_minimal(base_size = 13) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "top",
legend.title = element_blank(),
strip.background = element_rect(fill = "gray90", color = NA),
strip.text = element_text(face = "bold", size = 12),
plot.title = element_textbox_simple(),
plot.margin = margin(t = 20, r = 10, b = 20, l = 10, unit = "pt"))
ggsave("regional_analysis_composite.png",width = 20, height = 20, units = "cm")
regional_analysis_composite
## # A tibble: 36 × 5
## # Groups: region, season [12]
## region season av_daylight_hrs av_items_per_1000 month
## <chr> <chr> <dbl> <dbl> <date>
## 1 east aut 269. 110. 2024-09-01
## 2 east aut 269. 119. 2024-10-01
## 3 east aut 269. 112. 2024-11-01
## 4 east spr 362. 109. 2024-03-01
## 5 east spr 362. 111. 2024-04-01
## 6 east spr 362. 118. 2024-05-01
## 7 east sum 453. 109. 2024-06-01
## 8 east sum 453. 115. 2024-07-01
## 9 east sum 453. 117. 2024-08-01
## 10 east win 164. 121. 2024-12-01
## # ℹ 26 more rows
3.3. Geospatial analysis of prescriptions per season per Health Board
library(patchwork)
map_antidepr_seasons <- analysis_geo_antidepr %>%
ggplot() +
geom_sf(aes(fill = av_items_per_1000), size = 0.1, colour = "darkgrey")+
scale_fill_distiller(
palette = "Blues",
direction = 1
) +
facet_wrap(~season, nrow = 1)+
labs(title = "Seasonal Antidepressant Prescriptions (March 2024 - February 2025)", subtitle = "Prescriptions by Scottish Health Board per 1000 people", fill = "") +
theme_void()+
theme(
plot.title = element_text(face = "bold", size = 10),
plot.subtitle = element_text(size = 9),
legend.title = element_text(face = "bold", size = 10)
)
map_daylight_seasons <- analysis_geo_antidepr %>%
ggplot() +
geom_sf(aes(fill = av_daylight_hrs), size = 0.1, colour = "darkgrey")+
scale_fill_distiller(
palette = "Oranges",
direction = 1
) +
facet_wrap(~season, nrow = 1)+
labs(title = "Seasonal Total Daylight Hours (March 2024 - February 2025)", fill = "") +
theme_void()+
theme(
plot.title = element_text(face = "bold", size = 10),
legend.title = element_text(face = "bold", size = 10)
)
map_antidepr_daylight <- map_antidepr_seasons / map_daylight_seasons +
plot_layout(heights = c(1,1))
map_antidepr_daylight
3.4. Geospatial analysis of SMID quintiles and prescription analysis
map_smid <- analysis_geo_antidepr %>%
ggplot() +
geom_sf(aes(fill = av_items_per_1000,
text = paste0(
"<b>NHS Board:</b> ", hb_name, "<br>",
"<b>Season:</b> ", season, "<br>",
"<b>Prescriptions per 1000:</b> ",
round(av_items_per_1000, 1), "<br>",
"<b>SMID Rank:</b> ", SMID_rank)),
colour = "grey20", size = 0.2) +
scale_fill_distiller(palette = "Blues", direction = 1) +
facet_wrap(~ season) +
labs(
title = "Antidepressant Prescriptions & SMID Rank per NHS HealthBoard",
fill = "Antidepressant Prescriptions per 1000") +
theme_void() +
theme(
plot.title = element_text(face = "bold", size = 14),
strip.text = element_text(face = "bold"))
## Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat = stat,
## : Ignoring unknown aesthetics: text
map_smid_interactive <- ggplotly(map_smid, tooltip = "text")
map_smid_interactive
# Bivariate map
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:patchwork':
##
## align_plots
## The following object is masked from 'package:gt':
##
## as_gtable
## The following object is masked from 'package:lubridate':
##
## stamp
biv_bins <- analysis_geo_antidepr %>%
mutate(prescr_bin = ntile(av_items_per_1000,3),
smid_bin = ntile(SMID_rank, 3),
biv_class = paste0(prescr_bin, "-", smid_bin))
biv_palette <- c(
"1-1" = "#e8e8e8",
"2-1" = "#b8d6be",
"3-1" = "#64acbe",
"1-2" = "#d4b9da",
"2-2" = "#a5add3",
"3-2" = "#4a7bb7",
"1-3" = "#c994c7",
"2-3" = "#df65b0",
"3-3" = "#dd1c77")
biv_matrix <- expand.grid(prescr_bin = 1:3, smid_bin = 1:3) %>%
mutate(smid_bin = 4 - smid_bin, biv_class = paste0(prescr_bin, "-", smid_bin))
biv_legend <- biv_matrix %>%
ggplot(aes(x = prescr_bin, y = smid_bin, fill = biv_class)) +
geom_tile(color = "white") +
scale_fill_manual(values = biv_palette, guide = "none") +
scale_y_continuous(breaks = 1:3, labels = c("Low", "", "High")) +
scale_x_continuous(breaks = 1:3, labels = c("Low", "", "High")) +
labs(
x = "Prescriptions",
y = "SMID Rank") +
coord_fixed(ratio = 1)+
theme_minimal(base_size = 9) +
theme_void()+
theme(
axis.title = element_text(size = 8, face = "bold"),
axis.text = element_text(size = 6),
panel.grid = element_blank(),
plot.margin = margin(t = 0, r = 5, b = 0, l = 0))
map_biv <- biv_bins %>%
ggplot() +
geom_sf(aes(fill = biv_class), size = 0.1, colour = "white") +
scale_fill_manual(values = biv_palette, guide = "none") +
facet_wrap(~season, nrow = 1) +
labs(
title = "Bivariate Map showing Antidepressant prescription and SMID ranking",
subtitle = "Prescriptions per 1000 people across Scottish NHS Healthboard",
fill = "Bivariate classification") +
coord_sf(expand = FALSE) +
theme_void()+
theme(
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(size = 10),
panel.spacing = unit(0.1, "lines"),
strip.text = element_text(face = "bold", size = 8),
plot.margin = margin(t = 0, r = 10, b = 0, l = 10, unit = "pt"))
final_map_biv <- plot_grid(map_biv, biv_legend,
ncol = 2,
rel_widths = c(4, 1),
align = "t")
final_map_biv
# Interactive scatter plot
smid_daylight_scatter <-plot_ly(
data = analysis_antidepr_smid,
x = ~SMID_rank,
y = ~av_items_per_1000,
color = ~hb_name,
text = ~paste0(
"<b>NHS Board:</b> ", hb_name, "<br>",
"<b>Season:</b> ", season, "<br>",
"<b>Prescriptions per 1000:</b> ",
round(av_items_per_1000, 1), "<br>",
"<b>SMID Rank:</b> ", SMID_rank),
hoverinfo = "text",
type = "scatter",
mode = "markers",
marker = list(size = 10, opacity = 0.8)) %>%
layout(
title = list(text = "Seasonal Antidepressant Prescriptions and Health Board Deprivation rankings<sup><br> Prescriptions per 1000 people</sup>", x = 0.5, xanchor = "center"),
xaxis = list(title = "SMID Rank (Higher = Less Deprived)"),
yaxis = list(title = "Antidepressant Prescriptions (unit/per 1000)"),
margin = list(t = 90, r = 100, b = 85, l = 85))
smid_daylight_scatter
## Warning in RColorBrewer::brewer.pal(max(N, 3L), "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(max(N, 3L), "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors